Real-Time Info Midterm Assignment¶

Group members: Purva Kapshikar, Hamzah Yaacob, and Jackson Zeng¶

In [44]:
# todo: 
# make this more of a narrative
# fill out background, data sources (e.g. give their headways)
# jackson to edit bus stop delays map to have hover, and put here
# jackson to make bus lines map

Research question¶

Which of AC Transit's Lines 19, 20, and 51A would benefit most from real-time information displays at some of its bus stops?

Background¶

Data sources¶

  • US Census and ACS estimates
  • MTC 511 Open Data (real-time delays)
  • AC Transit on-time performance data
In [38]:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx
import plotly.express as px

Map of bus lines and bus stops¶

In [3]:
# purva map

Analysis of surrounding Census Tracts¶

We downloaded Census data on demographics, housing, and transportation characteristics from Social Explorer and used geopandas's plot function to make choropleth maps.

In [4]:
census_tracts_data = pd.read_csv("data/census/census_tracts_data.csv")
In [5]:
census_tracts_data = census_tracts_data.dropna(axis=1,how="all")

We first convert the Social Explorer column names to more comprehensible ones.

In [6]:
columns_to_keep = ['Geo_FIPS',
                   'SE_A03001_001',
                   'SE_A03001_002',
                   'SE_A03001_003',
                   'SE_A03001_005',
                   'SE_A01001_011',
                   'SE_A01001_012',
                   'SE_A01001_013',
                   'SE_A14001_001',
                   'SE_A14001_002',
                   'SE_A14001_003',
                   'SE_A14001_004',
                   'SE_A14001_005',
                   'SE_A14001_006',
                   'SE_A14001_007',
                   'SE_A14001_008',
                   'SE_A09005_001',
                   'SE_A09005_003']
census_tracts_data = census_tracts_data[columns_to_keep]
In [7]:
census_tracts_data.columns = ['FIPS',
                              'Total Population',
                              'White Alone',
                              'Black or African American Alone',
                              'Asian Alone',
                              '65 to 74 years',
                              '75 to 84 years',
                              '85 years and above',
                              'Households',
                              'Less than $10,000',
                              '$10,000 to $14,999',
                              '$15,000 to $19,999',
                              '$20,000 to $24,999',
                              '$25,000 to 29,999',
                              '$30,000 to $34,999',
                              '$35,000 to $39,999',
                              'Workers 16 years and over',
                              'Public Transportation (Includes Taxicab)']
In [8]:
geom_census_data = pd.read_csv('data/census/bus_route_tracts.csv')
In [9]:
geom_census_data = geom_census_data.rename(columns = {"GEOID":"FIPS"})
geom_census_data = geom_census_data[["FIPS", "geometry"]]

We have two datasets, one with the Social Explorer data, and one that has the tract geometries. We merge these on the FIPS column.

In [10]:
joined_census_data = census_tracts_data.merge(right=geom_census_data, on="FIPS")
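
A merge like this silently drops tracts whose FIPS codes do not match, and it can fail outright if one table stores FIPS as integers and the other as strings. The sketch below uses hypothetical toy tables (the FIPS values, populations, and geometries are invented for illustration) to show one way of checking a merge before relying on it.

```python
import pandas as pd

# Hypothetical toy tables standing in for the Social Explorer data and the tract geometries
attributes = pd.DataFrame({"FIPS": [6001400100, 6001400200, 6001400300],
                           "Total Population": [3000, 4200, 1800]})
geometries = pd.DataFrame({"FIPS": ["6001400100", "6001400200", "6001409999"],
                           "geometry": ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"]})

# Align the key dtype first so that integer and string FIPS codes can be compared
attributes["FIPS"] = attributes["FIPS"].astype(str)

# validate= raises an error if a FIPS code is duplicated on either side;
# indicator= adds a _merge column saying which table each row came from
checked = attributes.merge(geometries, on="FIPS", how="outer",
                           validate="one_to_one", indicator=True)

matched = checked[checked["_merge"] == "both"]
unmatched = checked[checked["_merge"] != "both"]
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Any rows left in `unmatched` are tracts that an inner merge would have dropped, so they are worth inspecting before mapping.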

As there is now a geometry column, we convert this dataframe to a geodataframe.

In [11]:
joined_census_data["geometry"] = gpd.GeoSeries.from_wkt(joined_census_data["geometry"])
joined_census_data = gpd.GeoDataFrame(joined_census_data, geometry="geometry")

We create variables for percent Black, percent Asian, percent low-income, and percent taking transit from the Social Explorer data, since percentages are easier to compare across tracts on a choropleth map than raw counts. We use the contextily library to add a basemap (here, CartoDB's Positron) and lower the alpha value so that the basemap remains visible beneath the tracts.

In [12]:
joined_census_data['PCT_Black'] = joined_census_data['Black or African American Alone']/joined_census_data['Total Population']*100
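
Percentages like this one can misbehave for tracts with no residents, where the denominator is zero. The following is a minimal sketch of a guard, assuming a hypothetical helper named `percent_share` and invented toy counts; it is not part of our analysis above.

```python
import numpy as np
import pandas as pd

def percent_share(numerator, denominator):
    """Return numerator / denominator * 100, with NaN where the denominator is zero."""
    safe_denominator = denominator.replace(0, np.nan)
    return numerator / safe_denominator * 100

# Hypothetical tracts; the last one has no residents
toy = pd.DataFrame({"Black or African American Alone": [500, 120, 0],
                    "Total Population": [2000, 4000, 0]})
toy["PCT_Black"] = percent_share(toy["Black or African American Alone"],
                                 toy["Total Population"])
print(toy)
```

Tracts with a NaN percentage are simply left uncoloured by geopandas's plot function unless a `missing_kwds` style is supplied.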
In [13]:
ax_bl = joined_census_data.plot(figsize=(12,10),
                         column='PCT_Black',
                         legend=True,
                         cmap='OrRd',
                         scheme='equal_interval',
                         linewidth=0.2,
                         edgecolor='grey',
                         alpha=0.8,
                         legend_kwds={
                              'loc': 'upper right',
                              'bbox_to_anchor':(1,1),
                              'title': '% Black'
                         })
cx.add_basemap(ax_bl, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage Black population",
          fontsize=14,
          color="black");
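
The maps in this section use `scheme='equal_interval'`, which splits the range of values into classes of equal width. The sketch below illustrates the idea with numpy and pandas on invented percentages, assuming five classes; it approximates what the classification does rather than reproducing the library's exact implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical tract percentages, invented for illustration
pct = pd.Series([2.0, 5.5, 11.0, 18.0, 24.0, 37.5, 42.0])

k = 5  # number of classes (an assumption for this sketch)
edges = np.linspace(pct.min(), pct.max(), k + 1)

# Assign each value to a class; include_lowest keeps the minimum in the first class
classes = pd.cut(pct, bins=edges, include_lowest=True, labels=False)

print(edges)
print(classes.value_counts().sort_index())
```

Because the class widths are fixed, skewed data can leave some classes empty, which is one reason to compare against a quantile scheme when the maps look washed out.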
In [14]:
joined_census_data['PCT_Asian'] = joined_census_data['Asian Alone']/joined_census_data['Total Population']*100
In [15]:
ax_as = joined_census_data.plot(figsize=(12,10),
                         column='PCT_Asian',
                         legend=True,
                         cmap='OrRd',
                         scheme='equal_interval',
                         linewidth=0.2,
                         edgecolor='grey',
                         alpha=0.8,
                         legend_kwds={
                              'loc': 'upper right',
                              'bbox_to_anchor':(1,1),
                              'title': '% Asian'
                         })
cx.add_basemap(ax_as, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage Asian population",
          fontsize=14,
          color="black");

We considered low-income households to be those with incomes of less than $40,000.

In [16]:
low_income_brackets = ['Less than $10,000',
                       '$10,000 to $14,999',
                       '$15,000 to $19,999',
                       '$20,000 to $24,999',
                       '$25,000 to 29,999',
                       '$30,000 to $34,999',
                       '$35,000 to $39,999']
# skipna=False matches adding the columns with +: a missing bracket gives NaN
joined_census_data['Low_Income'] = joined_census_data[low_income_brackets].sum(axis=1, skipna=False)
In [17]:
joined_census_data['PCT_Low_Income'] = joined_census_data['Low_Income']/joined_census_data['Households']*100
In [18]:
ax_li = joined_census_data.plot(figsize=(12,10),
                         column='PCT_Low_Income',
                         legend=True,
                         cmap='OrRd',
                         scheme='equal_interval',
                         linewidth=0.2,
                         edgecolor='grey',
                         alpha=0.8,
                         legend_kwds={
                              'loc': 'upper right',
                              'bbox_to_anchor':(1,1),
                              'title': '% Low Income'
                         })
cx.add_basemap(ax_li, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage low-income households (under $40,000)",
          fontsize=14,
          color="black");
In [19]:
joined_census_data['PCT_Take_Transit'] = joined_census_data['Public Transportation (Includes Taxicab)']/joined_census_data['Workers 16 years and over']*100
In [20]:
ax_tr = joined_census_data.plot(figsize=(12,10),
                         column='PCT_Take_Transit',
                         legend=True,
                         cmap='OrRd',
                         scheme='equal_interval',
                         linewidth=0.2,
                         edgecolor='grey',
                         alpha=0.8,
                         legend_kwds={
                             'loc': 'upper right',
                             'bbox_to_anchor':(1,1),
                             'title': '% Workers Taking Transit'
                         })
cx.add_basemap(ax_tr, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage of workers who take public transport",
          fontsize=14,
          color="black");

Highest Transit Ridership Tracts¶

We were particularly interested in the commute mode share of workers in the area, so we identified the five tracts with the highest transit ridership along each bus line. In this notebook, we show these for the 51A line. We first load the Census data and the dataset containing the tract geometries and join the two together.

In [21]:
census_data_51a = pd.read_csv("data/51a-tract-data/tracts_data_51a.csv")
geom_data_51a = pd.read_csv("data/51a-tract-data/tracts_51A.csv")
geom_data_51a = geom_data_51a.rename(columns={"GEOID": "FIPS"})
geom_data_51a = geom_data_51a.loc[:, ["FIPS", "geometry"]]
data_51a = census_data_51a.merge(right=geom_data_51a, on="FIPS")

We then make this dataframe a geodataframe.

In [22]:
data_51a["geometry"] = gpd.GeoSeries.from_wkt(data_51a["geometry"])
data_51a = gpd.GeoDataFrame(data_51a, geometry="geometry")

We shorten the headers of the commute-mode columns, which were very long, by dropping the "Workers 16 Years and Over: " prefix.

In [23]:
modes = ["Car, Truck, or Van", "Drove Alone", "Carpooled", "Public Transportation (Includes Taxicab)", "Motorcycle", "Bicycle", "Walked", "Other Means"]

# Strip the common prefix from every mode column in a single rename call
data_51a = data_51a.rename(columns={"Workers 16 Years and Over: " + mode: mode for mode in modes})

We then identify the five tracts with the most public transit commuters.

In [28]:
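# .copy() gives an independent dataframe, so later column edits do not trigger SettingWithCopyWarning
top_5_transit_tracts = data_51a.sort_values(by="Public Transportation (Includes Taxicab)", ascending=False).head(5).copy()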
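
As an aside, pandas also offers `nlargest` for this kind of selection. The sketch below uses invented tract labels and counts to show that it picks the same rows as sorting and taking the head when there are no ties.

```python
import pandas as pd

# Hypothetical tracts and transit commuter counts, invented for illustration
tracts = pd.DataFrame({
    "FIPS": ["A", "B", "C", "D", "E", "F", "G"],
    "Public Transportation (Includes Taxicab)": [310, 95, 540, 410, 120, 275, 620],
})

column = "Public Transportation (Includes Taxicab)"
top_sorted = tracts.sort_values(by=column, ascending=False).head(5)
top_nlargest = tracts.nlargest(5, column)

print(top_nlargest)
```

Either approach works for a dataset of this size; `nlargest` mainly saves a step.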

We first plot the count of residents using public transit with pandas's plot.barh function.

In [29]:
top_5_transit_tracts.plot.barh(x="FIPS",
                            y="Public Transportation (Includes Taxicab)",
                            xlabel="Count",
                            title="Top surrounding tracts with most residents using public transit",
                            legend=False);

After learning about the plotly library the following week, we improved on this chart by making an interactive bar chart that also shows the overall mode share of these tracts; hovering over a bar reveals the specific count for each mode. We converted our dataframe to a "long" format, with one row per tract and mode, to work with plotly's bar function.

In [30]:
top_5_transit_tracts['Area Name'] = top_5_transit_tracts['Area Name'].str.replace("Census Tract ", "")
In [31]:
top_5_transit_tracts_long = pd.melt(top_5_transit_tracts, id_vars='Area Name', value_vars=modes)
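
To make the wide-to-long conversion concrete, here is a small sketch with two invented tracts and three modes. The tract names and counts are hypothetical.

```python
import pandas as pd

# Hypothetical "wide" table: one row per tract, one column per mode
wide = pd.DataFrame({
    "Area Name": ["4001", "4002"],
    "Drove Alone": [800, 650],
    "Public Transportation (Includes Taxicab)": [300, 420],
    "Walked": [90, 150],
})

# "Long" table: one row per (tract, mode) pair
long = pd.melt(wide,
               id_vars="Area Name",
               value_vars=["Drove Alone",
                           "Public Transportation (Includes Taxicab)",
                           "Walked"])
print(long)
```

The resulting `variable` and `value` columns are what `px.bar` uses for the `color` and `y` arguments.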
In [40]:
fig_transit_tracts = px.bar(top_5_transit_tracts_long, x="Area Name", y="value", color="variable", 
                            labels={
                                "value": "Count",
                                "variable": "Transportation mode",
                            },
                            title="Mode share in tracts with highest transit ridership")

fig_transit_tracts.update_layout(xaxis_title="Census Tract")
fig_transit_tracts.show()

Bus performance¶

On-time performance¶

We reached out to AC Transit for data on delays and performance. They shared on-time performance (OTP) data for these three lines, which we have plotted below.

In [41]:
otp_19 = pd.read_csv('data/on-time performance/On Time Performance_Line 19 (2022).csv')
otp_20 = pd.read_csv('data/on-time performance/On Time Performance_Line 20 (2022).csv')
otp_51A = pd.read_csv('data/on-time performance/On Time Performance_Line 51A (2022).csv')
In [42]:
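def plot_otp(df, i, line):
    # Work on a copy so that re-running this cell does not rescale the data again
    df = df.copy()
    # Convert OTP from a fraction to a percentage
    df['Eastbound OTP'] = df['Eastbound OTP']*100
    df['Westbound OTP'] = df['Westbound OTP']*100
    
    # axes_otp is the array of subplot axes created in the next cell
    f = df.plot(ax=axes_otp[i], rot=90, legend=False, figsize=(10,5))
    f.set_ylim(bottom=0, top=100)

    # Only the leftmost panel keeps its y-axis label and ticks
    if i == 0:
        f.set_ylabel("Percent")
    else:
        f.set_yticks([])
        
    f.set_xticks(range(12), labels=["Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sep", "Oct", "Nov", "Dec"])
    f.set_title("Line " + line)
    
    # Only the rightmost panel shows the legend
    if i == 2:
        f.legend(loc = 'lower right')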
In [43]:
fig_otp, axes_otp = plt.subplots(nrows=1, ncols=3)
plot_otp(otp_19, 0, "19")
plot_otp(otp_20, 1, "20")
plot_otp(otp_51A, 2, "51A")
fig_otp.suptitle("AC Transit on-time performance (2022)");
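
One way to compare the lines numerically, alongside the plots, is to average OTP by direction for each line. The sketch below uses invented OTP fractions, not the AC Transit data, and assumes the same `Eastbound OTP` and `Westbound OTP` column names as above.

```python
import pandas as pd

# Hypothetical monthly OTP fractions for each line, invented for illustration only
otp = {
    "19": pd.DataFrame({"Eastbound OTP": [0.70, 0.74], "Westbound OTP": [0.68, 0.72]}),
    "20": pd.DataFrame({"Eastbound OTP": [0.80, 0.84], "Westbound OTP": [0.78, 0.82]}),
    "51A": pd.DataFrame({"Eastbound OTP": [0.60, 0.66], "Westbound OTP": [0.62, 0.64]}),
}

# One row per line, one column per direction, in percent
summary = pd.DataFrame({
    line: df[["Eastbound OTP", "Westbound OTP"]].mean() * 100
    for line, df in otp.items()
}).T
summary["Mean OTP"] = summary.mean(axis=1)

lowest = summary["Mean OTP"].idxmin()
print(summary.round(1))
print("Lowest mean OTP in this toy data: Line", lowest)
```

A simple average treats every month and direction equally; weighting by ridership or trips would need data we do not show here.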

Real-time delays¶

In [ ]:
 

Division of work¶

In [ ]:
 